-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
peft_config & is_loaded_in_4bit check added to Reward_Trainer #2427
Conversation
if not isinstance(peft_config, PeftConfig): | ||
raise ValueError( | ||
"If you want to use the PeftModel, you need to pass a valid PeftConfig object to the RewardTrainer." | ||
f" and you passed a {type(peft_config)}." | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not isinstance(peft_config, PeftConfig): | |
raise ValueError( | |
"If you want to use the PeftModel, you need to pass a valid PeftConfig object to the RewardTrainer." | |
f" and you passed a {type(peft_config)}." | |
) | |
I don't think it's necessary here. The code will eventually fail if the peft config doesn't have the right type
Thanks for this addition @shirinyamani! Is there a simple way to test it? Ideally a small piece of code that would fail before this PR but passes after? |
The simplest way might be testing it with
|
To clarify: I think it's okay not to add a test in our unit test for this PR (because it's specific to multi-gpu configuration, and it's not trivial to set up with GitHub actions, but we'll do it anyway in the future). However, we should check “by hand” that it works. Do you have a small command line / example script that I can run in my local multi-gpu configuration to check that it works as expected? |
Btw if you don't have multi-gpu setting available, feel free to share a code that might not be correct, I can test it quickly on my side. |
Rn I can think of two approaches both when we do NOT want to run manually using
|
What does this PR do?
This PR adds the peft_config to the reward modeling to allow user to pick a peft_config of choice, e.g. Lora, Dora, etc. It also checks for
is_loaded_in_4bit
and make sure the user DO NOT use QLoRA + FSDP to avoid conflicr inprepare_model_for_kbit_training
orpeft_module_casting_to_bf16
that assume access to the full model!Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.